Abstract

An automatic parser generator is a tool for quickly implementing
programming language parsers. Parser generators based upon LR
parsing have been built for grammars satisfying the LR(0), SLR(1),
and LALR(1) properties. The speed of the resulting parser is comparable
to that of a hand-coded recursive descent parser.

DAVE, an automatic program testing aid, requires a flexible,
easy-to-implement parser. This report presents an LALR(1) grammar for ANSI
standard FORTRAN, suitable as input to an automatic parser generator.

Its use in building DAVE provides a measure of the desired flexibility,
since new parsers for FORTRAN dialects may be produced by simply
modifying the existing grammar.

A powerful meta-language is used to describe the grammar. Its
features are summarized, including the method for specifying automatic
construction of (intermediate-text) structure trees during parsing.

The report concludes with a discussion of some of the more important
decisions made during development of the grammar.
I. Introduction

Context-free grammars are widely recognized as appropriate tools for
describing the syntax of programming languages. Their formality has allowed
the language designer to communicate his intended structure precisely and
unambiguously, and more recently has allowed the language implementor
to generate the parsing phase of his compiler automatically.

FORTRAN was developed before the usefulness of grammars was fully
appreciated. Its standard document [1] uses English prose to communicate
syntactic structure. Since FORTRAN has already been widely implemented, a
FORTRAN grammar might appear to be of little practical use today.

Recent interest in the development of software validation tools,
however, has kept the market for efficient, easy-to-generate FORTRAN parsers
very much alive. The DAVE project [2], currently being improved at the
University of Colorado, is an example.

DAVE is an automatic program testing aid which performs a static
analysis of programs written in ANSI standard FORTRAN. Experience with
DAVE has uncovered a sizeable demand for diagnostic aids capable of
analyzing FORTRAN "dialects" as well. One solution is to provide a
flexible tool which may be easily converted to any of the FORTRAN variants.

The use of an automatic parser generator provides a step toward the desired
flexibility, since new parsers may be produced by simply modifying a basic
grammar.

The purpose of this report is to present a FORTRAN grammar which:

1) captures the structure of ANSI standard FORTRAN at the parsing
   level, and

2) satisfies the LALR(1) property, a condition required by the
   BOBSW parser generator system.

Details of the exact grammar requirements of the BOBSW system, and its use
in producing a parser for the DAVE project, may be found in [3].

It is assumed that the reader is familiar with grammars, their
relation to programming languages, and the parsing process. A good
elementary treatment may be found in Gries [4]. Hopcroft and Ullman [5]
provide a more theoretical approach to grammars and their properties. The
operation of a parser generator based upon the LALR(1) property is
described in LaLonde [6].


Section II contains a listing of the FORTRAN parser grammar and a
review of the meta-language used to describe it. Section III concludes
with a discussion of some important decisions made during development of
the grammar.
II. The Grammar

Keywords and special symbols belonging to the meta-language used
in this report are shown in Figure 1. Although keywords appear
underlined in the grammar listing that follows, they are in fact reserved
words and may not be used as nonterminal symbols.

The meta-language has been designed to accept nested "sub-grammar"
definitions. To facilitate machine checks on proper nesting, each grammar
is delimited by a pair of keywords:

parser Fortran_compilation_unit:
end Fortran_compilation_unit

The name following parser identifies the goal symbol of the grammar, and
must exactly match the name following end. A terminating colon on the
parser line allows the grammar writer to omit the preceding name
(it may never be omitted from the end line). In this case, the
first production of the grammar serves to identify its goal symbol.
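The following is a minimal sketch of the nesting check described above. It assumes the grammar is read as plain text. Each parser line opens a (sub-)grammar, and each end line must close the innermost open one with a matching goal-symbol name. When the name is omitted, it is taken from the left hand side of the first production. The tokenizer and the function name check_nesting are illustrative assumptions, not part of the BOBSW system.

```python
import re

# Illustrative sketch only; not the BOBSW implementation.
# Quoted terminals, comments, arrows, identifiers, then any other single character.
TOKEN = re.compile(r"'[^' ]*'|#.*|->|=>|[A-Za-z][A-Za-z0-9_]*|\S")


def tokens(text):
    """Split grammar text into meta-language tokens, dropping '#' comments."""
    for line in text.splitlines():
        for tok in TOKEN.findall(line):
            if not tok.startswith("#"):
                yield tok


def check_nesting(text):
    """Raise SyntaxError unless every 'parser' is closed by a matching 'end'."""
    toks = list(tokens(text))

    def at(k):
        return toks[k] if k < len(toks) else None

    open_goals = []              # goal symbols of the currently open grammars
    i = 0
    while i < len(toks):
        tok = toks[i]
        if tok == "parser":
            name = None
            if at(i + 1) != ":":            # goal symbol written explicitly
                name = at(i + 1)
                i += 1
            if at(i + 1) != ":":
                raise SyntaxError("parser line must end with ':'")
            i += 1
            open_goals.append(name)
        elif tok == "end":
            name = at(i + 1)
            if not open_goals:
                raise SyntaxError(f"'end {name}' has no matching 'parser'")
            expected = open_goals.pop()
            if name != expected:
                raise SyntaxError(f"'end {name}' does not match goal {expected}")
            i += 1
        elif tok == "->" and open_goals and open_goals[-1] is None:
            open_goals[-1] = toks[i - 1]    # first production names the goal
        i += 1
    if open_goals:
        raise SyntaxError(f"grammar {open_goals[-1]} is never closed")
    return True
```

Because 'END' is written as a quoted terminal in the grammar, it is never confused with the end keyword.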

Productions are unordered and written in free-form syntax. A
sharp (#), which may appear anywhere in the grammar, indicates that the
remaining portion of the current line is to be treated as a comment.

Nonterminal symbols are written as a sequence of one or more alphabetic,
numeric, or underbar characters, beginning with an alphabetic character.
Terminal symbols are delimited by single quotes, and may consist of any
sequence of printable characters except the single quote and blank.
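The two lexical rules above can be expressed as regular expressions. This sketch makes three assumptions:

- Input is ASCII.
- A terminal contains at least one character.
- The reserved set contains only the keywords mentioned in this section (parser, end, list, rlist). The full keyword list is in Figure 1, which is not reproduced here.

The names classify and RESERVED are illustrative.

```python
import re

# Nonterminal: a letter, then letters, digits, or underbars.
NONTERMINAL = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# Terminal: single quotes around one or more printable ASCII characters other
# than the blank (0x20) and the single quote (0x27).
TERMINAL = re.compile(r"'[!-&(-~]+'")
# Assumption: only the keywords named in this section; Figure 1 has the full set.
RESERVED = {"parser", "end", "list", "rlist"}


def classify(symbol: str) -> str:
    """Return 'terminal', 'nonterminal', or 'invalid' for one grammar symbol."""
    if TERMINAL.fullmatch(symbol):
        return "terminal"
    if NONTERMINAL.fullmatch(symbol) and symbol not in RESERVED:
        return "nonterminal"
    return "invalid"


for s in ["Program_unit", "'<name>'", "'**'", "_tmp", "rlist", "'A B'"]:
    print(s, classify(s))
```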

The terminal symbols of a parser grammar correspond to tokens received
from a scanner module at parse time. Two kinds of tokens can be
identified. The first kind may be described as representing entities
having a unique form in the source language. A complete list of such
tokens for FORTRAN, representing keywords and special characters, is
given in Figure 2. A careful inspection of this list will reveal the
absence of several FORTRAN keywords and the addition of several new
ones. A discussion of issues relating to these anomalies is deferred
to Section III.
Figure 2

Complete list of terminal symbols (tokens) which represent FORTRAN
keywords and special characters.
The second kind of token represents entities which do not have a
unique representation in the source language. For example, a given
program may contain many different integer constants. When the scanner
module returns a token of type "integer constant", it must also include
some subrosa information indicating the exact integer chosen by the
programmer. Figure 3 gives a complete list of all such tokens as they
will appear in the FORTRAN grammar of this report. Sample subrosa
information is shown for each token. Notice that angle brackets are
used to distinguish tokens requiring subrosa information from the
simple tokens of Figure 2.
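One way to model the two kinds of token is a single record type whose subrosa field is filled in only for the angle-bracket kinds. The Token class and its validation rule are an illustrative assumption about the scanner interface, not the representation used by DAVE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """A scanner token; angle-bracket kinds (Figure 3) carry subrosa information."""
    kind: str                      # e.g. "END", "=", "<Iconst>", "<name>"
    subrosa: Optional[str] = None  # e.g. "1000" for an <Iconst>, "YMAX" for a <name>

    def __post_init__(self):
        # Assumption: only angle-bracket kinds may (and must) carry subrosa data.
        needs = self.kind.startswith("<") and self.kind.endswith(">")
        if needs and self.subrosa is None:
            raise ValueError(f"token {self.kind} requires subrosa information")
        if not needs and self.subrosa is not None:
            raise ValueError(f"token {self.kind} has a unique form; no subrosa expected")


# Tokens a scanner might deliver for the statement YMAX = 1
stream = [Token("<name>", "YMAX"), Token("="), Token("<Iconst>", "1"), Token("EOS")]
```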

The following example illustrates how a production may be written
in its most basic form:

Slash -> '/';

The production separator symbol (->) is preceded by a single nonterminal
and followed by a sequence of zero or more terminals, nonterminals, and
meta-symbols. A semicolon terminates the production.

Very often a nonterminal symbol will appear as the left hand side of
more than one production. There are two (equivalent) ways of conveniently
grouping such productions together:

Field
    -> Basic_field
    -> Group1;

and

Field
    -> Basic_field | Group1;

The vertical bar indicates alternation, and may also appear with parentheses
to effect a kind of "distributive" property. For example,

Program_unit
    -> (Subprogram | Program_body) 'END';

is equivalent to

Program_unit
    -> Subprogram 'END'
    -> Program_body 'END';
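The distributive reading of alternation inside parentheses can be made concrete with a small expander. It turns a right hand side into the plain alternatives it abbreviates. The token-list input format and the function name expand are assumptions for illustration. Repetition and list keywords are not handled.

```python
from itertools import product


def expand(tokens):
    """Return every plain alternative (a tuple of symbols) denoted by tokens."""
    pos = 0

    def alternation():
        nonlocal pos
        choices = list(sequence())
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            choices += sequence()
        return choices

    def sequence():
        nonlocal pos
        parts = []                      # one list of choices per item
        while pos < len(tokens) and tokens[pos] not in ("|", ")"):
            if tokens[pos] == "(":
                pos += 1
                parts.append(alternation())
                if pos >= len(tokens) or tokens[pos] != ")":
                    raise SyntaxError("missing ')'")
                pos += 1
            else:
                parts.append([(tokens[pos],)])
                pos += 1
        # Distribute: one alternative for every combination of choices.
        # An empty sequence yields the single empty alternative ().
        return [sum(combo, ()) for combo in product(*parts)]

    result = alternation()
    if pos != len(tokens):
        raise SyntaxError("unexpected ')'")
    return result


print(expand(["(", "Subprogram", "|", "Program_body", ")", "'END'"]))
```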
Figure 3

Complete list of terminal symbols (tokens) which require
associated subrosa information.
Two meta-symbols have been included to describe the concept of
repetition more conveniently. A trailing plus character (+) indicates "one
or more" of the entity which precedes it. For example,

Sep -> Slash+;

is written to express the fact that a separator may consist of one or
more slashes. This same concept may be described without the plus, but
requires one additional production:

Sep
    -> Slash
    -> Sep Slash;

Similarly, a trailing asterisk (*) indicates "zero or more" of the entity
which precedes it.
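This is a sketch of how a generator might eliminate + and * from a right hand side. It follows the left-recursive pattern shown above for Sep, assuming the repeated entity is a nonterminal. The generated nonterminal names (X_plus, X_star) are hypothetical. A real generator would choose names that cannot collide with the grammar writer's own. Unlike the hand-written Sep example, the sketch always introduces a new nonterminal.

```python
def eliminate_repetition(rhs):
    """Replace each 'X +' or 'X *' in rhs by a new left-recursive nonterminal.

    Returns the rewritten right hand side and the extra productions,
    each production being a (lhs, rhs_tuple) pair.
    """
    new_rhs, extra = [], []
    for sym in rhs:
        if sym in ("+", "*"):              # bare + and * are meta-symbols
            if not new_rhs:
                raise SyntaxError(f"'{sym}' must follow a symbol")
            item = new_rhs.pop()
            name = f"{item}_{'plus' if sym == '+' else 'star'}"  # hypothetical name
            first = (item,) if sym == "+" else ()   # "one or more" vs "zero or more"
            extra += [(name, first), (name, (name, item))]
            new_rhs.append(name)
        else:
            new_rhs.append(sym)
    return tuple(new_rhs), extra


# Program_body -> Body_group1* Body_group2 Body_group3*
rhs, extra = eliminate_repetition(["Body_group1", "*", "Body_group2", "Body_group3", "*"])
```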

The keywords list and rlist have been included to capture the syntax of
ordinary lists of objects more conveniently. For example, the
production

Ext -> 'EXTERNAL' '<name>' list ',';

is written to indicate that a FORTRAN external statement must contain the
word EXTERNAL followed by a list of one or more names separated by commas.
Expressing the list concept without a special keyword requires an
additional nonterminal and two more productions:

Ext
    -> 'EXTERNAL' Name_list;

Name_list
    -> '<name>'
    -> Name_list ',' '<name>';

The keyword rlist is distinguished from list only by the fact that
its elimination results in a right recursive expansion instead of a left
recursive one:

Ext -> 'EXTERNAL' '<name>' rlist ',';

expands to

Ext
    -> 'EXTERNAL' Name_list;

Name_list
    -> '<name>'
    -> '<name>' ',' Name_list;

Right recursion is sometimes necessary to achieve the LALR(1) property,
as will be demonstrated in Section III.
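The two expansions differ only in where the recursive nonterminal appears. In this minimal sketch, the caller supplies the generated nonterminal name (Name_list, as in the text). A helper confirms that both expansions derive the same sentence.

```python
def expand_list(item, sep, name, right=False):
    """Productions replacing 'item list sep' (or 'item rlist sep' if right=True)."""
    if right:
        recursive = (item, sep, name)      # rlist: Name_list -> '<name>' ',' Name_list
    else:
        recursive = (name, sep, item)      # list:  Name_list -> Name_list ',' '<name>'
    return [(name, (item,)), (name, recursive)]


def derive(productions, name, n):
    """Expand `name` into a list of n >= 1 items, using the recursive rule n-1 times."""
    base, recursive = productions[0][1], productions[1][1]
    form = [name]
    for _ in range(n - 1):
        i = form.index(name)
        form[i:i + 1] = recursive
    i = form.index(name)
    form[i:i + 1] = base
    return form


left = expand_list("'<name>'", "','", "Name_list")
right = expand_list("'<name>'", "','", "Name_list", right=True)
```

Both grammars describe the same lists. Only the shape of the derivation, and therefore the behavior of the parse stack, differs.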

The basic activity of a programming language parser is to discover
the structure of an input program and to verify that the syntactic rules
of the language have not been violated. In many cases, the parser also
transforms source code into a suitable intermediate form so that later
processing is made easier. One possibility is conversion to a structure
tree, where relationships among the syntactic units of a program are
represented in tree form. For example, the FORTRAN assignment statement

A=B+C

could be represented in intermediate form by the structure tree shown in
Figure 4.

The meta-language used here contains mechanisms which allow the
grammar writer to specify tree building activities. A brief summary of
LR parsing is given to help explain how tree construction may be
combined with the parsing process.

LR parsing may be viewed as a sequence of read and reduce actions.
During a read action, the parser requests the next input token from the
scanner module and, depending upon its current state and the token
received, moves to a next state. The new token is pushed onto a parse
stack, where a summary of the "already seen" portion of source text is
maintained.

Reduce actions become possible whenever the top symbol(s) of the parse
stack exactly match the symbol(s) on the right hand side of a grammar
production. (For LR(1) parsing, the appropriateness of a reduction may be
determined by looking no more than one token ahead in the input stream.)
During a reduce action, the matching symbols are removed from the stack and
replaced by the single nonterminal which appears on the left hand side of
that production, and a new state is entered. The objective is to continue
with read and reduce actions until there are no more input tokens to read
and only the goal symbol of the grammar remains on the parse stack.

Figure 4

Structure tree corresponding to the FORTRAN assignment statement A=B+C.
Note that the variable names are actually subrosa information attached
to <name> leaves, and are not considered nodes of the tree.
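The read/reduce cycle can be imitated in a few lines of code. This sketch is deliberately simplified. A real parser uses LR states and a one-token look-ahead table; this one reduces whenever the top of the parse stack matches a right hand side. That is enough for the toy grammar Basic_stmt -> '<name>' '=' Expression; Expression -> '<name>' '+' '<name>', but it is not a general LR(1) parser. The names GRAMMAR and shift_reduce are illustrative.

```python
# Toy grammar: (left hand side, right hand side) pairs.
GRAMMAR = [
    ("Expression", ("<name>", "+", "<name>")),
    ("Basic_stmt", ("<name>", "=", "Expression")),
]


def shift_reduce(tokens, goal="Basic_stmt"):
    """Return (accepted, trace) for a list of token kinds.

    Simplification: reduce by pattern matching instead of LR(1) tables.
    """
    pending = list(tokens)
    stack, trace = [], []
    while True:
        for lhs, rhs in GRAMMAR:
            if tuple(stack[-len(rhs):]) == rhs:      # right hand side on top of stack
                del stack[-len(rhs):]
                stack.append(lhs)
                trace.append(f"reduce {lhs}")
                break
        else:                                       # no reduction possible
            if not pending:
                break
            token = pending.pop(0)
            stack.append(token)
            trace.append(f"read {token}")
    return stack == [goal], trace


accepted, trace = shift_reduce(["<name>", "=", "<name>", "+", "<name>"])
```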

Tree construction is carried out during the reduce actions of parsing.
An additional stack, called the tree node stack, is added to facilitate
the linking of nodes into a tree. The exact process is best described by
an example.

Suppose the grammar writer would like to specify the construction of
structure trees for assignment statements. For example, he would like a
parse of A=B+C to result in the tree of Figure 4. Assume for now that
just two grammar productions are needed to describe assignment statements:

Basic_stmt -> '<name>' '=' Expression;

Expression -> '<name>' '+' '<name>';

His major task will be to imagine how such statements will be parsed,
and to identify the order of the various reduce actions that will take
place. To illustrate, a parse of A=B+C is described.

First, the parser receives a <name> token from the scanner, with "A"
included as subrosa information. Since this activity is a read action,
the <name> token is pushed onto the parse stack. Whenever the parser's
tree-building option is turned ON, receipt of a terminal symbol delimited
by angle brackets will also result in the creation of a corresponding tree
node. This new node is then pushed onto the tree node stack as shown in
Figure 5(a).

Next, the scanner supplies a token representing the FORTRAN equals
sign. Although this token participates in parse stack activities, it does
not result in creation of a new tree node, since it is not surrounded by
angle brackets in the production for Basic_stmt.

Receipt of the next <name> token, corresponding to variable B, results
in actions identical to those for variable A. The modified tree node
stack is shown in Figure 5(b). Note that the parse is now "following" the
production for Expression.

The next token, representing a FORTRAN plus symbol, results only in
parse stack activities, since '+' is likewise not delimited by angle
brackets. Finally, a <name> token corresponding to variable C is read,
resulting in the tree node stack of Figure 5(c).


Figure 5(a)-(c): the tree node stack holding one, two, and three <name> nodes.


It now happens that the top three symbols of the parse stack exactly
match the symbols on the right hand side of the production for Expression.
A reduce action involving this production is therefore indicated, and
carried out. The grammar writer may specify that tree building activities
should also be performed at this time by augmenting his grammar production
with a double right arrow meta-symbol (=>).

For example, if he writes

Expression
    -> '<name>' '+' '<name>' => 'plus';

then during any reduce action involving that production, a new tree node
is created and labeled "plus". This node is then automatically linked
into the existing tree structure by means of the following actions:

1) The two <name> nodes at the top of the tree node stack, corresponding
   to the two <name> symbols on the right hand side of the
   production for Expression, are linked as sons of the new "plus"
   node.

2) The sons are then popped from the tree node stack and replaced
   by their parent.

The result of these actions for the current example is shown in
Figure 5(d).
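The walkthrough can be replayed in code. As described above, reading a token written with angle brackets creates a leaf on the tree node stack. A reduction by a production carrying => 'label' pops one node for each angle-bracket terminal or nonterminal on its right hand side, then pushes a new parent. The sketch also performs the 'becomes' reduction for Basic_stmt that the text turns to next. Reduce decisions are made by simple pattern matching rather than LR tables. Node, PRODUCTIONS, and parse_and_build are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    info: Optional[str] = None            # subrosa information (leaves only)
    sons: List["Node"] = field(default_factory=list)


# (lhs, rhs, tree label or None); mirrors the two productions in the text.
PRODUCTIONS = [
    ("Expression", ("<name>", "+", "<name>"), "plus"),
    ("Basic_stmt", ("<name>", "=", "Expression"), "becomes"),
]
NONTERMINALS = {lhs for lhs, _, _ in PRODUCTIONS}


def parse_and_build(tokens):
    """tokens: (kind, subrosa) pairs. Returns (parse stack, tree node stack)."""
    pending, parse_stack, node_stack = list(tokens), [], []
    while True:
        for lhs, rhs, label in PRODUCTIONS:
            if tuple(parse_stack[-len(rhs):]) == rhs:      # reduce action
                del parse_stack[-len(rhs):]
                parse_stack.append(lhs)
                if label is not None:
                    # One node per angle-bracket terminal or nonterminal in rhs.
                    k = sum(1 for s in rhs if s.startswith("<") or s in NONTERMINALS)
                    sons = node_stack[len(node_stack) - k:]
                    del node_stack[len(node_stack) - k:]
                    node_stack.append(Node(label, sons=sons))
                break
        else:
            if not pending:
                break
            kind, info = pending.pop(0)
            parse_stack.append(kind)                       # read action
            if kind.startswith("<"):                       # angle brackets: new leaf
                node_stack.append(Node(kind, info))
    return parse_stack, node_stack


def show(node):
    if node.info is not None:
        return f"{node.label}:{node.info}"
    return f"{node.label}(" + ", ".join(show(s) for s in node.sons) + ")"


stacks = parse_and_build([("<name>", "A"), ("=", None), ("<name>", "B"),
                          ("+", None), ("<name>", "C")])
print(show(stacks[1][0]))   # becomes(<name>:A, plus(<name>:B, <name>:C))
```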

The parsing process continues with a reduce action involving the
production for Basic_stmt. The fact that more tree building is desired
may be indicated by writing

Basic_stmt
    -> '<name>' '=' Expression => 'becomes';

In this case, a new "becomes" node is created and linked into the
existing tree structure as shown in Figure 5(e). Notice that the resulting
tree is identical to the one shown in Figure 4, and has been created
"bottom up".

During linking of the "becomes" node, the Expression nonterminal in
the production for Basic_stmt corresponds to a single node on the tree
node stack (specifically, the "plus" node). It should be noted that
nonterminals which precede a + or * repetition meta-symbol may correspond
to more than one stack entry. An automatic counting mechanism is
provided to handle these cases.
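The report does not say how the automatic counting mechanism works. One plausible scheme, sketched here purely as an assumption, records beside each parse stack entry the number of tree nodes it accounts for. Reductions without => add the counts together, so a left-recursive expansion of Unit+ accumulates one node per Unit. A reduction with => then pops exactly that many nodes. The action sequence is written out by hand in place of LR tables. The productions Unit -> '<name>' => 'unit' and Goal -> Unit+ => 'compile' are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    info: Optional[str] = None
    sons: List["Node"] = field(default_factory=list)


# Hypothetical grammar: Goal -> Unit+ => 'compile'; Unit -> '<name>' => 'unit',
# with Unit+ expanded to Unit_plus -> Unit | Unit_plus Unit (no tree labels).
# key: (lhs, length of rhs, tree label or None)
PRODUCTIONS = {
    "unit":  ("Unit", 1, "unit"),
    "plus1": ("Unit_plus", 1, None),
    "plus2": ("Unit_plus", 2, None),
    "goal":  ("Goal", 1, "compile"),
}


def run(actions):
    """actions: ("read", kind, subrosa) or ("reduce", production key)."""
    parse_stack = []   # entries: (symbol, number of tree nodes it stands for)
    node_stack = []
    for action in actions:
        if action[0] == "read":
            _, kind, info = action
            if kind.startswith("<"):
                node_stack.append(Node(kind, info))
                parse_stack.append((kind, 1))
            else:
                parse_stack.append((kind, 0))
            continue
        lhs, n, label = PRODUCTIONS[action[1]]
        popped = parse_stack[len(parse_stack) - n:]
        del parse_stack[len(parse_stack) - n:]
        count = sum(c for _, c in popped)
        if label is None:
            parse_stack.append((lhs, count))        # pass the nodes through
        else:
            sons = node_stack[len(node_stack) - count:]
            del node_stack[len(node_stack) - count:]
            node_stack.append(Node(label, sons=sons))
            parse_stack.append((lhs, 1))
    return node_stack


# The actions an LR parser would take for three names a, b, c.
actions = [("read", "<name>", "a"), ("reduce", "unit"), ("reduce", "plus1")]
for name in "bc":
    actions += [("read", "<name>", name), ("reduce", "unit"), ("reduce", "plus2")]
actions.append(("reduce", "goal"))
roots = run(actions)
```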

This completes the review of meta-language features. A listing
of the FORTRAN parser grammar follows. Although construction of an
intermediate structure tree has been completely specified by means
of the double right arrow, the reader may wish to consult Appendix A
for a more graphic description of tree shape.
parser Fortran_compilation_unit:

# Overall program structure:

Fortran_compilation_unit
    -> Program_unit+                                    => 'compile';

Program_unit
    -> '<label>' Subprogram 'END'                       => 'labeled'
    -> (Subprogram | Program_body) 'END'
    -> '<label>' 'BLOCKDATA' 'EOS' Blockdata_stmts      => 'labeled'
    -> 'BLOCKDATA' 'EOS' Blockdata_stmts;

Blockdata_stmts
    -> Specification* Data_stmt* 'END'                  => 'blockdata';

Subprogram
    -> 'SUBROUTINE' '<name>' Subrtn_parameters 'EOS' Program_body
                                                        => 'subroutine'
    -> Rtrn_type 'FUNCTION' '<name>' Parameter_list 'EOS' Program_body
                                                        => 'function';

Subrtn_parameters
    -> Parameter_list
    ->                                                  => 'parameters';

Parameter_list
    -> '(' ('<name>' list ',') ')'                      => 'parameters';

Rtrn_type
    -> Type
    ->                                                  => 'default';

Type
    -> 'INTEGER'                                        => 'integer'
    -> 'REAL'                                           => 'real'
    -> 'DOUBLEPRECISION'                                => 'doubleprecision'
    -> 'COMPLEX'                                        => 'complex'
    -> 'LOGICAL'                                        => 'logical';

Program_body
    -> Body_group1* Body_group2 Body_group3*            => 'body';
Body_group1
    -> Specification
    -> External_stmt
    -> Format_stmt;

Body_group2
    -> Executable_stmt
    -> Function_or_array;

Body_group3
    -> Executable_stmt
    -> Function_or_array
    -> Format_stmt
    -> Data_stmt;

# FORTRAN declarations:

Specification
    -> Spec 'EOS'
    -> '<label>' Spec 'EOS'                             => 'labeled';

Spec
    -> 'DIMENSION' (Array_dcln list ',')                => 'dimension'
    -> 'COMMON' Com_block1 Com_block_rest*              => 'common'
    -> 'EQUIVALENCE' (Equiv_list list ',')              => 'equivalence'
    -> Type (Dcln_element list ',')                     => 'declaration';

Array_dcln
    -> '<name>' '(' Subscr_list ')' Type_placeholder    => 'array';

Subscr_list
    -> Integer                                          => ''
    -> Integer ',' Integer                              => ''
    -> Integer ',' Integer ',' Integer                  => '';

Integer
    -> '<Iconst>'
    -> '<name>';                                        # Integer variable

Type_placeholder
    ->                                                  => 'default';

Com_block1
    -> Com_name1 Dcln_list                              => 'block';

Com_name1
    -> '/' '<name>' '/'
    -> ('/' '/' | )                                     => 'blank';

Com_block_rest
    -> Com_name_rest Dcln_list                          => 'block';

Com_name_rest
    -> '/' '<name>' '/'
    -> '/' '/'                                          => 'blank';

Dcln_list
    -> Common_dcln_element list ','                     => '';

Common_dcln_element
    -> ('<name>' | Common_array);
Common_array
    -> '<name>' '(' Iconst_list ')' Type_placeholder    => 'array';

Equiv_list
    -> '(' Declarator ',' (Declarator list ',') ')'     => 'share';

Declarator
    -> '<name>'
    -> '<name>' '(' Iconst_list ')'                     => 'element';

Iconst_list
    -> '<Iconst>'                                       => ''
    -> '<Iconst>' ',' '<Iconst>'                        => ''
    -> '<Iconst>' ',' '<Iconst>' ',' '<Iconst>'         => '';

Dcln_element
    -> ('<name>' | Array_dcln);

External_stmt
    -> Ext 'EOS'
    -> '<label>' Ext 'EOS'                              => 'labeled';

Ext
    -> 'EXTERNAL' ('<name>' list ',')                   => 'external';

Data_stmt
    -> Data 'EOS'
    -> '<label>' Data 'EOS'                             => 'labeled';

Data
    -> 'DATA' (Data_pair list ',')                      => 'data';

Data_pair
    -> Declarator_list '/' Data_list '/'                => 'pair';

Declarator_list
    -> Declarator list ','                              => 'declarators';

Data_list
    -> Data_item list ','                               => 'dataitems';

Data_item
    -> ('<Hconst>' | '<Lconst>' | Data_number)
    -> '<Iconst>' '*' ('<Hconst>' | '<Lconst>' | Data_number)
                                                        => '*';
Data_number
    -> Complex_const
    -> Number
    -> '+' Number
    -> '-' Number                                       => 'neg';

Complex_const
    -> '(' Cconst_element ',' Cconst_element ')'        => 'cconst';

Cconst_element
    -> '<Rconst>'
    -> '+' '<Rconst>'
    -> '-' '<Rconst>'                                   => 'neg';

Number
    -> '<Iconst>'
    -> '<Rconst>'
    -> '<DPconst>';
# FORTRAN format statements:

Format_stmt
    -> '<label>' Fmt 'EOS'                              => 'labeled';

Fmt
    -> 'FORMAT' '(' Slash* ')'                          => 'format'
    -> 'FORMAT' '(' Slash* (Field list Sep) Slash* ')'  => 'format';

Slash
    -> '/';

Sep
    -> ','
    -> Slash+;

Field
    -> Basic_field
    -> Group1;

Basic_field
    -> '<Hconst>'
    -> '<Fmtfld>';

Group1
    -> Repeat_count Fmt1                                => 'group';

Fmt1
    -> '(' Slash* ')'                                   => 'format'
    -> '(' Slash* (Field1 list Sep) Slash* ')'          => 'format';

Field1
    -> Basic_field
    -> Group2;

Group2
    -> Repeat_count Fmt2                                => 'group';

Fmt2
    -> '(' Slash* ')'                                   => 'format'
    -> '(' Slash* (Basic_field list Sep) Slash* ')'     => 'format';

Repeat_count
    -> '<Iconst>'
    ->                                                  => 'one';
# FORTRAN function_or_array statements:

Function_or_array
    -> Foa 'EOS'
    -> '<label>' Foa 'EOS'                              => 'labeled';

Foa
    -> '<name>' '(' Exprn_list ')' '=' Expression       => 'foa';

# FORTRAN executable statements:

Executable_stmt
    -> Exec 'EOS'
    -> '<label>' Exec 'EOS'                             => 'labeled';

Exec
    -> 'DO' '<label>' '<name>' '=' Do_parameters        => 'do'
    -> 'LOGIF' '(' Logical_exprn ')' Basic_stmt         => 'logif'
    -> 'LOGIF' '(' Paren_name ')' Basic_stmt            => 'logif'
    -> Basic_stmt;

Do_parameters
    -> Integer ',' Integer                              => ''
    -> Integer ',' Integer ',' Integer                  => '';

Basic_stmt
    -> '<name>' '=' Expression                          => 'becomes'
    -> 'ASSIGN' '<label>' 'TO' '<name>'                 => 'assign'
    -> 'GOTO' '<label>'                                 => 'goto'
    -> 'GOTO' '(' Label_list ')' ',' '<name>'           => 'compgo'
    -> 'GOTO' '<name>' ',' '(' Label_list ')'           => 'assigngo'
    -> 'ARITHIF' '(' Arith_exprn ')' '<label>' ',' '<label>' ',' '<label>'
                                                        => 'arithif'
    -> 'CALL' '<name>' Call_args                        => 'call'
    -> 'RETURN'                                         => 'return'
    -> 'CONTINUE'                                       => 'continue'
    -> 'STOP'                                           => 'stop'
    -> 'STOP' '<Octconst>'                              => 'stop'
    -> 'PAUSE'                                          => 'pause'
    -> 'PAUSE' '<Octconst>'                             => 'pause'
    -> 'REWIND' Integer                                 => 'rewind'
    -> 'BACKSPACE' Integer                              => 'backspace'
    -> 'ENDFILE' Integer                                => 'endfile'
    -> 'READ' '(' Integer Frmt ')' Possible_IO_list     => 'read'
    -> 'WRITE' '(' Integer Form ')' Possible_IO_list    => 'write'
    -> 'WRITE' '(' Integer Form_placeholder ')' IO_list => 'write';

Label_list
    -> '<label>' list ','                               => '';

Call_args
    ->                                                  => ''
    -> '(' ('<Hconst>' | Expression) list ',' ')'       => '';

Frmt
    -> Form_placeholder
    -> Form;

Form
    -> ('<label>' | '<name>')                           => 'fmt';

Form_placeholder
    ->                                                  => 'fmt';

Possible_IO_list
    ->                                                  => 'iolist'
    -> IO_list;

IO_list
    -> (Named_value | '(' Named_value rlist ',' ')' | '(' Iteration_list ')')
       rlist ','                                        => 'iolist';

Iteration_list
    -> (Named_value | '(' Named_value rlist ',' ')' | '(' Iteration_list ')')
       ',' Do_specification                             => 'iterate'
    -> (Named_value | '(' Named_value rlist ',' ')' | '(' Iteration_list ')')
       ',' Iteration_list                               => 'iterate';

Do_specification
    -> '<name>' '=' Do_parameters                       => 'do_spec';
# FORTRAN expressions:

Expression
    -> (Logical_exprn | Arith_exprn);

Logical_exprn
    -> L_term
    -> Logical_exprn 'OR' (L_term | Paren_name)         => 'or'
    -> Paren_name 'OR' (L_term | Paren_name)            => 'or';

L_term
    -> L_factor
    -> L_term 'AND' (L_factor | Paren_name)             => 'and'
    -> Paren_name 'AND' (L_factor | Paren_name)         => 'and';

L_factor
    -> L_primary
    -> 'NOT' (L_primary | Paren_name)                   => 'not';

L_primary
    -> '<Lconst>'
    -> Relational_exprn
    -> '(' Logical_exprn ')';

Relational_exprn
    -> Arith_exprn '<Relop>' Arith_exprn                => 'relop';

Arith_exprn
    -> Paren_name
    -> Simple_AE;

Simple_AE
    -> A_term
    -> Simple_AE '+' (A_term | Paren_name)              => 'plus'
    -> Simple_AE '-' (A_term | Paren_name)              => 'minus'
    -> Paren_name '+' (A_term | Paren_name)             => 'plus'
    -> Paren_name '-' (A_term | Paren_name)             => 'minus'
    -> '+' (A_term | Paren_name)
    -> '-' (A_term | Paren_name)                        => 'neg';
A_term
    -> A_factor
    -> A_term '*' (A_factor | Paren_name)               => 'mult'
    -> A_term '/' (A_factor | Paren_name)               => 'div'
    -> Paren_name '*' (A_factor | Paren_name)           => 'mult'
    -> Paren_name '/' (A_factor | Paren_name)           => 'div';

A_factor
    -> A_primary
    -> A_primary '**' (A_primary | Paren_name)          => 'pwr'
    -> Paren_name '**' (A_primary | Paren_name)         => 'pwr';

A_primary
    -> Number
    -> Complex_const
    -> '(' Simple_AE ')';

Paren_name
    -> Named_value
    -> '(' Paren_name ')'                               => 'parens';

Named_value
    -> '<name>'
    -> '<name>' '(' Exprn_list ')'                      => 'apply';

Exprn_list
    -> Expression list ','                              => '';

end Fortran_compilation_unit
III. Discussion

The FORTRAN parser grammar was derived in two steps. First, a
straightforward grammar was written to capture ANSI standard FORTRAN,
without regard to the LALR(1) property. The resulting grammar was then
modified to attain LALR(1). A discussion of issues relating to these
steps is given in the four sub-sections below.

Information Sources

The document entitled "USA Standard FORTRAN, X3.9-1966" [1] served
as the basic reference for syntactic structure. Syntax charts developed
by McIlroy [8] were later used to verify that the initial grammar was
a "correct" interpretation of the standard.

Completeness

Some aspects of FORTRAN syntax are not easily specified in a parser
grammar. The following syntax rules must be processed after the parsing
phase (references to the standard are shown in parentheses).

1) The integer constant zero may not appear

   a) as a declarator subscript in an array declaration (7.2.1.1), or

   b) as a data item replication factor in a DATA initialization
      statement (7.2.2), or

   c) as a parameter in a DO statement (7.1.2.8).

2) A statement label must be greater than zero (3.4).

3) Statement function definitions must precede the first executable
   statement of the given program unit (9.1.1). (Some statement
   function definitions cannot be syntactically distinguished from
   assignment statements in which an array element appears to the left
   of the equals sign.)

4) The dummy arguments of a statement function definition must be
   distinct variable names (8.1.1).

5) The expression appearing to the right of the equals sign in
   a statement function definition may only contain

   a) Non-Hollerith constants

   b) Variable references

   c) Intrinsic function references

   d) References to previously defined statement functions

   e) External function references

   Note that array element references are excluded (8.1.1).

6) A RETURN statement may not appear in the main program (7.1.2.5).

7) Since arrays must be defined with 1, 2, or 3 dimensions (7.2.1.1),
   array elements must be specified with no more than 3 subscripts
   (5.1.3.2).

8) Array element subscripts must be written as one of the following
   constructs

       c * v + k
       c * v - k
       c * v
       v + k
       v - k
       v
       k

   where c and k are integer constants and v is an integer variable
   reference (5.1.3.3).

9) The number of subscripts of an array element in an EQUIVALENCE
   statement must correspond to the dimensionality of the array
   declarator or must be one (7.2.1.4).
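As an example of a check left to a later pass, rules 7 and 8 can be tested on the subscripts of an array element once the parser has delivered them. This sketch makes three assumptions:

- Subscripts arrive as upper-case source text.
- The caller can supply the set of integer variable names from a symbol table. The integer_vars parameter is hypothetical.
- Names follow the FORTRAN limit of six alphanumeric characters, starting with a letter.

```python
import re

# The seven forms of rule 8: c*v+k, c*v-k, c*v, v+k, v-k, v, k
# (c and k unsigned integer constants; v, captured as group 1, a name of up
# to six characters).
SUBSCRIPT = re.compile(r"(?:\d+\*)?([A-Z][A-Z0-9]{0,5})(?:[+-]\d+)?|\d+")


def subscript_errors(subscripts, integer_vars):
    """Check rules 7 and 8 for one array element; return a list of messages.

    integer_vars is a hypothetical symbol-table lookup: the set of names
    known to be integer variables.
    """
    errors = []
    if not 1 <= len(subscripts) <= 3:
        errors.append(f"{len(subscripts)} subscripts (rule 7 allows 1 to 3)")
    for s in subscripts:
        m = SUBSCRIPT.fullmatch(s.replace(" ", ""))   # blanks are not significant
        if m is None:
            errors.append(f"{s!r} is not one of the permitted forms")
        elif m.group(1) is not None and m.group(1) not in integer_vars:
            errors.append(f"{m.group(1)} is not an integer variable")
    return errors


print(subscript_errors(["2*I+1", "J-1", "3"], {"I", "J"}))
```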

Scanner Interface

A scanner interface may be specified by listing all of the token
types to be passed from scanner to parser, together with conventions
regarding the transmission of subrosa information. Figures 2 and 3
list the FORTRAN token types used in the parser grammar of this report.
Although the choice of token types for FORTRAN is generally
straightforward, several decisions were guided by more subtle
considerations and are worthy of special mention.

The end-of-statement (EOS) token is made necessary by the fact
that READ and WRITE statements need not contain input/output lists.
For example, both WRITE(6,1000) and WRITE(6,1000)YMAX are legal
statements according to the rules of ANSI standard FORTRAN. Suppose that
EOS tokens were not supplied by the scanner module, and a FORTRAN program
contained the following statement sequence:

      WRITE(6,1000)
      YMAX = 1

A problem occurs when parsing reaches the end of the WRITE statement: it
is impossible for a parser with only one-token look-ahead to tell
whether the next token, YMAX, is part of the WRITE statement or part of
the following assignment statement. The inclusion of an intervening
EOS token (generated by the scanner) resolves this ambiguity.
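A scanner can produce the EOS token almost for free while it assembles card images into statements. The sketch assumes standard fixed-form cards:

- C in column 1 marks a comment.
- Columns 1-5 hold the label.
- A continuation mark in column 6 is anything other than blank or zero.
- Statement text occupies columns 7-72.

Tokenizing each statement is left to a caller-supplied tokenize function, which is a hypothetical placeholder.

```python
def statements(cards):
    """Group fixed-form card images into (label, statement text) pairs."""
    stmts = []
    for card in cards:
        if not card.strip() or card[0] in "Cc":      # blank or comment card
            continue
        label, mark, text = card[0:5].strip(), card[5:6], card[6:72]
        if mark not in ("", " ", "0") and stmts:     # continuation card
            prev_label, prev_text = stmts[-1]
            stmts[-1] = (prev_label, prev_text + text)
        else:
            stmts.append((label, text))
    return stmts


def token_stream(cards, tokenize):
    """Yield each statement's tokens, then an explicit EOS token.

    tokenize is a hypothetical statement-level tokenizer supplied by the caller.
    """
    for label, text in statements(cards):
        if label:
            yield ("<label>", label)
        yield from tokenize(text)
        yield ("EOS", None)


def whole(text):
    """Stand-in tokenizer: one pseudo-token per statement, blanks removed."""
    return [("stmt", text.replace(" ", ""))]


print(list(token_stream(["      WRITE(6,1000)", "      YMAX = 1"], whole)))
```

With the EOS token between them, the parser sees the end of the WRITE statement before it sees YMAX.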

Early versions of the FORTRAN grammar expressed IF statement syntax
by means of the following productions:

Exec -> 'IF' '(' Logical_exprn ')' Basic_stmt;

Basic_stmt -> 'IF' '(' Arith_exprn ')' '<label>' ','
              '<label>' ',' '<label>';

Unfortunately, these productions are not LALR(1). The problem occurs
when an IF statement of the form IF(A) ... is encountered. The parser
cannot decide (with just one-token look-ahead) whether to reduce
the named value A to a logical expression or to an arithmetic expression.

Discovery of this problem led to the realization that ambiguities
involving named values appear in other contexts as well. Appendix B
gives a complete account of the problem, and details the extensive set
of expression grammar transformations necessary to solve it.

The results of Appendix B add one more production to the description
of the IF statement, but do not solve the original problem:

Exec
    -> 'IF' '(' Logical_exprn ')' Basic_stmt
    -> 'IF' '(' Paren_name ')' Basic_stmt;

Basic_stmt -> 'IF' '(' Arith_exprn ')' '<label>' ','
              '<label>' ',' '<label>';

Now  when  the  parser  encounters  a Paren_name  (i.e.,  a named  value  surround- 
ed by  zero  or  more  sets  of  parentheses),  it  cannot  decide  whether  to  con- 
tinue reading  or  to  reduce  that  Paren_name  to  Arith_exprn.  Intuitively, 
the  reason  is  that  the  parser  does  not  know  which  kind  of  IF  statement 
is being parsed until after the reduce decision has been made.

The problem is solved by providing two token types for the IF keyword,
one for logical IF statements (LOGIF) and one for arithmetic IF
statements (ARITHIF):

Exec -> 'LOGIF' '(' Logical_exprn ')' Basic_stmt
      | 'LOGIF' '(' Paren_name ')' Basic_stmt;

Basic_stmt -> 'ARITHIF' '(' Arith_exprn ')' '<label>' ','
              '<label>' ',' '<label>';
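The text does not say here how the scanner decides which IF token to emit. One plausible approach, sketched below purely as an assumption, is to find the parenthesis that closes the IF condition and test whether the rest of the statement is exactly three statement labels separated by commas. The sketch removes blanks (which are not significant in FORTRAN), ignores Hollerith constants, and uses the hypothetical name classify_if.

```python
import re

LABELS = re.compile(r"\d+,\d+,\d+")  # three statement labels

def classify_if(statement: str) -> str:
    """Choose the IF token type by inspecting the text after the IF condition."""
    s = statement.replace(" ", "").upper()  # blanks are not significant
    if not s.startswith("IF("):
        raise ValueError("not an IF statement")
    depth = 0
    for i in range(2, len(s)):
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
            if depth == 0:  # this parenthesis closes the condition
                return "ARITHIF" if LABELS.fullmatch(s[i + 1:]) else "LOGIF"
    raise ValueError("unbalanced parentheses")

print(classify_if("IF (Y) 20, 30, 40"), classify_if("IF (X) GO TO 10"))
```

Because the scanner sees the whole statement, it can make this choice before the parser begins on the condition, which is what lets the grammar keep the two IF forms apart.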

Right  Recursion 

Recall from section II that both list and rlist may be used to express
the syntax of ordinary lists of objects. Their only distinguishing
feature is that list results in a left recursive expansion, while rlist
results in a right recursive one. Although Figure 6 clearly demonstrates
that left recursion is preferred in LR parsing because it results in a
smaller parse stack, right recursion is sometimes necessary to achieve
the LALR(1) property.
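The stack-size comparison of Figure 6 can be reproduced with a small hand-written shift-reduce simulation. It is a sketch, not the parser generator's output: the grammars L -> name | L ',' name and L -> name | name ',' L are assumed stand-ins for list and rlist, and parse_left and parse_right are hypothetical names. Each function returns the final stack and the greatest depth the stack reached.

```python
def parse_left(names: list[str]) -> tuple[list[str], int]:
    """Shift-reduce parse of L -> name | L ',' name (reduce as early as possible)."""
    if not names:
        raise ValueError("empty list")
    stack: list[str] = []
    peak = 0
    for i, name in enumerate(names):
        if i > 0:
            stack.append(",")        # shift the separator
        stack.append(name)           # shift the name
        peak = max(peak, len(stack))
        if i == 0:
            stack[-1:] = ["L"]       # reduce by L -> name
        else:
            stack[-3:] = ["L"]       # reduce by L -> L ',' name
    return stack, peak


def parse_right(names: list[str]) -> tuple[list[str], int]:
    """Shift-reduce parse of L -> name | name ',' L (no reduction before the last name)."""
    if not names:
        raise ValueError("empty list")
    stack: list[str] = []
    peak = 0
    for i, name in enumerate(names):
        if i > 0:
            stack.append(",")
        stack.append(name)
        peak = max(peak, len(stack))
    stack[-1:] = ["L"]               # reduce the last name by L -> name
    while len(stack) > 1:
        stack[-3:] = ["L"]           # reduce by L -> name ',' L
    return stack, peak


for n in (3, 10):
    names = [f"X{k}" for k in range(n)]
    print(n, parse_left(names)[1], parse_right(names)[1])
```

For any list the left recursive version never holds more than three list symbols, while the right recursive version holds all 2n - 1 symbols of an n-element list before its first reduction.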

The FORTRAN standard describes the syntax of input/output lists as
follows:

"A  list  is  a simple  list,  a simple  list  enclosed  in  parentheses, 
a DO- implied  list,  or  two  lists  separated  by  a comma.  Lists  are 
formed  in  the  following  manner.  A simple  list  is  a variable  name,  an 
array  element  name,  or  an  array  name,  or  two  simple  lists  separated 
by  a comma.  A DO-implied  list  is  a list  followed  by  a comma  and  a 
DO-implied  specification,  all  enclosed  in  parentheses." 

This complex (and confusing!) structure may be expressed by the following
grammar productions, where the nonterminal Named_value stands for
variable name, array element name, and array name, and Iteration_list
may be considered a synonym for DO-implied list:

IO_list
    -> ( Named_value | '(' Named_value rlist ',' ')'
       | '(' Iteration_list ')' ) rlist ',' ;

Iteration_list
    -> ( Named_value | '(' Named_value rlist ',' ')'
       | '(' Iteration_list ')' ) Do_specification
     | ( Named_value | '(' Named_value rlist ',' ')'
       | '(' Iteration_list ')' ) Iteration_list ;

Figure 6

Parse stack just before the name C is read during parse of the FORTRAN
statement EXTERNAL A,B,C, shown with the left recursive grammar
production Ext -> 'EXTERNAL' '<name>' list and with the right recursive
grammar production Ext -> 'EXTERNAL' '<name>' rlist.

To  see  that  right  recursion  is  necessary,  consider  a parse  of  the 
statement 

WRITE(6,1000) (A(I),I,I=1,5)

Receipt of the opening parenthesis of the I/O list indicates to the
parser that either a named value list or an iteration list follows. If
left recursion had been used to specify named value lists, a read/reduce
conflict would occur after receipt of the next token, representing the
array element A(I). The parser cannot decide whether to reduce this
token immediately to Named_value_list, or to continue reading because an
iteration list is involved. The basic problem, then, is that the parser
cannot distinguish between named value lists and iteration lists until
either a closing parenthesis is read (indicating the former), or the
receipt of a FORTRAN equals sign indicates that a Do-specification is
being parsed. The use of right recursion (rlist) in specifying named
value lists solves the problem by delaying all reductions until the
entire I/O list has been seen.
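The sketch below is not part of the generated parser; it only makes concrete how far into the input the distinguishing evidence can lie. Given one parenthesized item of an I/O list, the hypothetical function classify_group scans until it finds either the matching closing parenthesis or an equals sign at the outermost nesting level. The token pattern is a simplifying assumption.

```python
import re

TOKEN = re.compile(r"[A-Z][A-Z0-9]*|\d+|\S")  # simplified tokens (assumption)

def classify_group(text: str) -> str:
    """Scan a parenthesized I/O list item until its kind is determined."""
    tokens = TOKEN.findall(text.upper())
    if not tokens or tokens[0] != "(":
        raise ValueError("expected an opening parenthesis")
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth == 0:
                return "named value list"  # closed before any outer-level '='
        elif tok == "=" and depth == 1:
            return "iteration list"        # start of the DO-implied specification
    raise ValueError("unbalanced parentheses")

print(classify_group("(A(I),I,I=1,5)"), classify_group("(A,B(2),C)"))
```

The deciding token may lie arbitrarily far to the right, which is why a parser with one token of look-ahead must postpone its reductions rather than commit when A(I) is first seen.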
Appendix B

Expression  Grammar  Transformations 

A straightforward grammar for FORTRAN expressions is shown in
Figure B1. The following abbreviations are used:

E - expression 
LE  - logical  expression 
LT  - logical  term 
LF  - logical  factor 
LP  - logical  primary 

<Lconst>  - logical  constant  (.TRUE.,  .FALSE.) 

RE  - relational  expression 
AE  - arithmetic  expression 
AT  - arithmetic  term 
AF  - arithmetic  factor 
AP  - arithmetic  primary 

N - number  (integer,  real,  or  double  precision  constant) 

CC  - complex  constant 

NV  - named  value  (simple  variable,  array  element,  or  function  call) 

Notice  that  this  grammar  allows  a named  value  (NV)  to  appear  in  any 
context  where  either  a logical  expression  (LE)  or  an  arithmetic  expression 
(AE)  is  required.  For  example,  in  the  FORTRAN  logical  if  statement 

IF  (X)  GO  TO  10 

the  named  value  X plays  the  role  of  a logical  expression,  while  in  the 
arithmetic  if  statement 

IF  (Y)  20,  30,  40 

Y takes  the  part  of  an  arithmetic  expression. 

A serious problem occurs, however, when a named value is asked to
fill the role of a general expression (E). For example, the right-hand
side of an assignment statement simply requires an expression; either
logical or arithmetic will do. When a named value is encountered in
this context, the parser does not know how to reduce that named value to
expression: should it first reduce to logical primary (LP) and then
continue with reductions involving logical entities, or should it first
reduce to arithmetic primary (AP) and take the "arithmetic route"?

Both paths eventually lead to expression.

Figure B1

A simple grammar for FORTRAN expressions

As a result, the grammar shown in Figure B1 is not LALR(1).
Intuitively, the reason is that the type attributes of named values are
not known during parsing.
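The ambiguity can be made concrete by listing the chains of single-symbol (unit) productions that lead from E down to NV. Since Figure B1 itself is not reproduced here, the table UNIT below is an assumed reading of it based on the abbreviations above, and derivation_chains is a hypothetical helper.

```python
# Unit productions (A -> B) assumed from the abbreviations above; this is an
# illustrative reading of Figure B1, not a copy of it.
UNIT = {
    "E": ["LE", "AE"],
    "LE": ["LT"], "LT": ["LF"], "LF": ["LP"], "LP": ["NV"],
    "AE": ["AT"], "AT": ["AF"], "AF": ["AP"], "AP": ["NV"],
}

def derivation_chains(top: str, bottom: str) -> list[list[str]]:
    """All chains of unit productions leading from top down to bottom."""
    if top == bottom:
        return [[top]]
    chains = []
    for child in UNIT.get(top, []):
        for rest in derivation_chains(child, bottom):
            chains.append([top] + rest)
    return chains

# Read bottom-up, each chain is a sequence of reductions from NV to E.
for chain in derivation_chains("E", "NV"):
    print(list(reversed(chain)))
```

Two distinct chains exist, so with NV on top of the stack and only a general expression required, an LR parser faces a reduce/reduce choice between LP -> NV and AP -> NV with nothing in the look-ahead to decide it.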

One possible solution to the type distinction problem is to combine
the separate sub-grammars for logical and arithmetic expressions into a
single grammar, similar to the approach taken in Pascal [7]. The
resulting grammar is shown in Figure B2. Extra processing will now be
required during later phases of analysis to verify that:

1) expressions do not inappropriately contain both logical and
arithmetic operators, and

2) logical and arithmetic expressions correctly appear in contexts
where they are required.

Unfortunately, the grammar of Figure B2 is not LALR(1) either. The
production  required  for  relational  expression  (RE)  has  caused  the  non- 
terminal E to  become  both  left-  and  right-recursive.  Pascal  avoids  this 
problem  by  placing  the  syntactic  description  for  relational  expression 
"higher"  in  the  grammar.  This  arrangement  has  a side  effect  of  requiring 
parentheses  in  logical  expressions  of  the  form: 

(X<5)  AND  (Y>3). 


Since  the  1966  ANSI  Standard  clearly  indicates  that  such  parentheses 
are  not  required  in  FORTRAN,  it  is  not  possible  to  similarly  modify  the 
grammar of Figure B2. Thus, the combined sub-grammar approach to
attaining the LALR(1) property must be abandoned.

Consider again the simple grammar of Figure B1. Another possible
solution is to remove one of the productions LP -> NV or AP -> NV. This
may be accomplished by use of the well-known "back substitution"
technique, a process which guarantees that the language being generated
does not change (see Lemma 4.2 in [5]). LP -> NV is arbitrarily chosen
for removal.
E -> T
E -> '+' T
E -> '-' T
E -> E '+' T
E -> E '-' T
E -> E 'or' T
T -> F
T -> T '*' F
T -> T '/' F
T -> T 'and' F
F -> P
F -> P '**' P
F -> 'not' P
P -> N
P -> CC
P -> '(' E ')'
P -> '<Lconst>'
P -> RE
P -> NV
RE -> E '<Relop>' E

Figure B2

Grammar with logical and arithmetic expressions combined.

The production for RE causes E to become both left- and right-recursive.
Notice that the second of these is another production of the form
α -> NV (where α is some non-terminal) and must therefore be removed.
Figure B3 shows the grammar that results after four such applications
of back substitution.

The table below indicates the production removed at each step:

STEP   PRODUCTION REMOVED

1      LP -> NV
2      LF -> NV
3      LT -> NV
4      LE -> NV
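The four steps in the table can be sketched in Python. The grammar below is a hypothetical fragment in the spirit of Figure B1 (the figure is not reproduced here), and back_substitute is an illustrative helper, not the parser generator's interface. Removing A -> x adds, for every production that mentions A, copies in which any nonempty set of the occurrences of A is replaced by x; the sketch assumes A is not the start symbol and x does not contain A.

```python
from itertools import product

Grammar = dict[str, list[tuple[str, ...]]]

def back_substitute(grammar: Grammar, lhs: str, rhs: tuple[str, ...]) -> Grammar:
    """Delete lhs -> rhs and substitute rhs for lhs wherever lhs is used."""
    new = {a: [alt for alt in alts if not (a == lhs and alt == rhs)]
           for a, alts in grammar.items()}
    for alts in new.values():
        extra: list[tuple[str, ...]] = []
        for alt in alts:
            spots = [i for i, sym in enumerate(alt) if sym == lhs]
            # Each nonempty subset of the occurrences of lhs may become rhs.
            for picks in product((False, True), repeat=len(spots)):
                chosen = {i for i, p in zip(spots, picks) if p}
                if not chosen:
                    continue
                out: list[str] = []
                for i, sym in enumerate(alt):
                    out.extend(rhs if i in chosen else (sym,))
                cand = tuple(out)
                if cand not in alts and cand not in extra:
                    extra.append(cand)
        alts.extend(extra)
    return new

# Hypothetical fragment of the simple expression grammar.
g: Grammar = {
    "E":  [("LE",), ("AE",)],
    "LE": [("LT",), ("LE", "or", "LT")],
    "LT": [("LF",), ("LT", "and", "LF")],
    "LF": [("LP",), ("not", "LP")],
    "LP": [("NV",), ("(", "LE", ")"), ("<Lconst>",)],
    "AE": [("AP",)],
    "AP": [("NV",), ("N",)],
}
for step in ("LP", "LF", "LT", "LE"):   # the order of the table above
    g = back_substitute(g, step, ("NV",))
print(g["E"], g["LP"])
```

After the fourth step the fragment contains E -> NV, and the same step has produced LP -> '(' NV ')', the parenthesized form whose consequences the text goes on to examine.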

The important consequence of this action has been to remove LP -> NV
in favor of E -> NV. On the surface it appears that a similar removal of
AP -> NV will solve the problem, since E -> NV would then be the only
remaining production of the form α -> NV. However, the back substitution
designed to eliminate LP -> NV has uncovered a deeper problem. Consider,
for example, the FORTRAN assignment statement X = (Y), in which a
parenthesized named value appears in a context where a general
expression is required. There are still two possibilities for reduction
to E:

1) By using LP -> '(' NV ')' as the first step in the reduction, or

2) By first reducing NV to AE (using AP -> NV as a first step) and
then by reducing '(' AE ')' to E (via the production AP -> '(' AE ')').

It is now clear that the problem with the simple grammar of Figure B1
is  not  just  one  of  reducing  NV  to  E,  but  involves  the  reduction  of  paren- 
thesized NV's  as  well,  where  nesting  levels  may  be  arbitrarily  deep! 

With this in mind, imagine the effects of continuing with more
rounds of back substitution. Each round begins with elimination of
a production

LP -> '(' ... '(' NV ')' ... ')',

where the depth of nesting has increased by one from the previous round.

The  grammar  grows  larger  and  larger,  but  a convenient  pattern  has 
emerged:  newly  added  productions  are  similar  to  those  seen  in  Figure  B3, 
but  with  ever-increasing  sets  of  parentheses  surrounding  those  posi- 
tions where  an  NV  appears. 

EX -> 'LOGIF' '(' NV ')' BS

Figure B3

Expression grammar after four steps of back substitution.

The additional production required for the logical-if statement
is also shown, with abbreviations:

EX - executable statement
'LOGIF' - IF token for logical-if statement
BS - basic statement (any executable except DO or logical-if)

If this process were to continue indefinitely, a new non-terminal
could be introduced to take advantage of this pattern:

PNV -> NV
PNV -> '(' PNV ')'.

This  non-terminal,  pronounced  "parenthesized  named  value",  captures 
the  notion  of  a named  value  enclosed  by  zero  or  more  sets  of  paren- 
theses. It  may  be  used  to  "collapse"  similar  productions,  thereby 
shortening  the  grammar  without  changing  the  language  generated. 
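A recognizer for the two PNV productions can be sketched as follows, assuming the input has already been reduced to a list of symbols in which each named value appears as the single token NV. Stripping one matching outer pair of parentheses at a time is the same as applying PNV -> '(' PNV ')' in reverse; is_pnv is a hypothetical name.

```python
def is_pnv(tokens: list[str]) -> bool:
    """True if tokens match PNV -> NV | '(' PNV ')'."""
    lo, hi = 0, len(tokens) - 1
    # Peel off one outer '(' ... ')' pair per application of PNV -> '(' PNV ')'.
    while hi - lo >= 2 and tokens[lo] == "(" and tokens[hi] == ")":
        lo += 1
        hi -= 1
    return lo == hi and tokens[lo] == "NV"  # what remains must be PNV -> NV

print(is_pnv(["(", "(", "NV", ")", ")"]), is_pnv(["(", "NV", "+", "NV", ")"]))
```

The second call fails because stripping leaves NV '+' NV, which is an arithmetic expression rather than a named value.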

Figure B4 shows the simplified grammar which results. Notice that a
potentially troublesome production, of the form

LP -> '(' ... '(' NV ')' ... ')',

has been safely dropped from the grammar, since after a sufficiently
large number of back substitutions it represents a logical primary
containing so many parentheses that there is not room to fit them all
into a standard FORTRAN statement (limited to 19 continuation lines).
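A rough calculation makes "sufficiently large" concrete. The sketch assumes the usual fixed-form layout, with statement text in columns 7 through 72 of each line, and ignores every other character the statement needs (keyword, operators, and so on), so the true limit is lower still. The constant and function names are hypothetical.

```python
# Fixed-form FORTRAN: statement text occupies columns 7 through 72.
STATEMENT_COLUMNS = 72 - 7 + 1   # 66 characters per line
LINES_PER_STATEMENT = 1 + 19     # initial line plus at most 19 continuation lines

def max_paren_depth(name_length: int = 1) -> int:
    """Loose upper bound on the sets of parentheses around one name in a statement."""
    capacity = STATEMENT_COLUMNS * LINES_PER_STATEMENT  # 1320 characters
    return (capacity - name_length) // 2                # each set costs '(' and ')'

print(max_paren_depth())
```

Under these assumptions no statement can hold more than 659 sets of parentheses around a one-character name, so productions with deeper nesting describe no legal statement.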

When  a similar  process  is  applied  to  arithmetic  expressions,  as 
shown  in  Figure  B5,  the  original  problem  finally  disappears.  Stand- 
alone named  values  (possibly  enclosed  in  parentheses)  may  be  unambig- 
uously reduced,  first  to  PNV  and  then  to  E.  Named  values  appearing 
in  more  complicated  expressions  are  recognized  by  the  productions 
which  were  added  during  back  substitution. 

Although the grammar of Figure B5 satisfies the LALR(1) property,
it is convenient to simplify it by means of the following steps:

1) Rename all occurrences of AE in the grammar to SAE (simple
arithmetic expression). The productions for E become:

E -> LE
E -> SAE
E -> PNV

2) Introduce a new "intermediate" non-terminal AE, such that:

E -> LE
E -> AE
AE -> SAE
AE -> PNV

3) Use the new non-terminal to collapse productions involving
basic statement (BS) and relational expression (RE).
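Step 3 can be read as a mechanical rule: a group of alternatives that differ only in whether each position holds SAE or PNV, and that covers every such combination, may be replaced by a single alternative with AE in those positions. The sketch below, using the hypothetical function collapse, assumes step 1 has already renamed every earlier occurrence of AE, so that AE appears only where collapse puts it.

```python
def collapse(alts: list[tuple[str, ...]],
             parts: tuple[str, ...] = ("SAE", "PNV"),
             merged: str = "AE") -> list[tuple[str, ...]]:
    """Merge alternatives that cover every SAE/PNV combination into one using AE."""
    groups: dict[tuple[str, ...], list[tuple[str, ...]]] = {}
    for alt in alts:
        key = tuple(merged if s in parts else s for s in alt)
        members = groups.setdefault(key, [])
        if alt not in members:
            members.append(alt)
    result: list[tuple[str, ...]] = []
    for key, members in groups.items():
        k = key.count(merged)
        # Complete coverage means all len(parts) ** k variants are present.
        if k and len(members) == len(parts) ** k:
            result.append(key)
        else:
            result.extend(members)
    return result

re_alts = [("SAE", "<Relop>", "SAE"), ("SAE", "<Relop>", "PNV"),
           ("PNV", "<Relop>", "SAE"), ("PNV", "<Relop>", "PNV")]
print(collapse(re_alts))
print(collapse([("LE",), ("SAE",), ("PNV",)]))
```

Applied to the four relational expression alternatives of Figure B5 (after renaming) the rule yields RE -> AE '<Relop>' AE, and applied to the E alternatives of step 1 it yields E -> LE and E -> AE, matching step 2.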

The final LALR(1) grammar for FORTRAN expressions is shown in
Figure B6.
Figure B4

Grammar after many steps of back substitution and subsequent
simplification via the introduced nonterminal PNV.
LE -> LT
LE -> LE 'or' LT
LE -> LE 'or' PNV
LE -> PNV 'or' LT
LE -> PNV 'or' PNV
LT -> LF

LT -> LT 'and' PNV
LT -> PNV 'and' LF
LT -> PNV 'and' PNV

RE -> AE '<Relop>' AE
RE -> AE '<Relop>' PNV
RE -> PNV '<Relop>' AE
RE -> PNV '<Relop>' PNV

Figure B5

Resulting  grammar  after  back  substitution  and  simplification  in  the 
arithmetic  expression  sub-grammar.  Additional  productions  required 
for  basic  statement  (BS)  and  relational  expression  (RE)  are  also  shown. 
Figure B6

Final LALR(1) grammar for FORTRAN expressions